Method and device for determining the quality of a signal



A. BACKGROUND OF THE INVENTION 

The invention lies in the area of quality measurement 
of sound signals, such as audio and voice signals. More 
in particular, it relates to a method and a device for 
determining, according to an objective measurement 
technique, the quality of an output signal from a signal- 
processing system with respect to a reference signal 
according to the preamble of claim 1 and claim 7, 
respectively. A method and a device of such type are 
known, e.g., from References [1,...,6] (for more 
bibliographic details on the References, see below under 
C. References). According to the present known technique, 
an output signal from an audio or voice signals-processing 
and/or transporting system, whose signal quality is to be 
determined, and a reference signal, are mapped on 
representation signals according to a psycho-physical 
perception model of the human hearing. As a reference 
signal, the input signal of the system that produced the 
output signal may be used, as in References [1,...,5]. 
Alternatively, as disclosed, e.g., in Reference [6], the 
reference signal may be an estimate of the original input 
signal, reconstructed from the output signal. 
Subsequently, a differential signal is determined 
as a function of time from said representation signals, 
which, according to the model used, is representative of a 
disturbance sustained in the system present in the output 
signal. The time-dependent differential signal, 
hereinafter also referred to as a disturbance signal, may 
be a difference signal or a ratio signal, or also a 
combination of both, and constitutes a time-dependent 
expression for the extent to which, according to the 
representation model, the output signal deviates from the 
reference signal. Finally, the disturbance signal is 
averaged over time, a time-independent quality signal 
being obtained, which is a measure of the quality of the 
auditive perception of the output signal. 

It is a known phenomenon that, when listening to an 
audio signal, even a short disturbance therein has a 
significant influence on the quality perception of the 
entire signal. This applies not only to spoken words and 
music, but in general to the reproduction of sound 
signals. Upon application of the customary linear time 
averaging, in such cases there is a poor correlation 
between human quality perception and the quality signal 
obtained by way of the measurement technique. Application 
of the "root mean square" as a time-averaging function 
admittedly provides some improvement, but even then the 
correlation is still too low for a good operation of the 
objective method. 

B. SUMMARY OF THE INVENTION 

The object of the invention is, inter alia, to 
provide a method and a device of the above type, with 
which a high correlation may be achieved between the human 
quality perception of an output signal and a quality 
signal obtained by way of the measurement technique, 
particularly in cases where the above phenomenon occurs. 
The considerations on which the invention is based are the 
following. The linear time averaging referred to above 
and the "root mean square" are actually special cases of 
the Lebesgue p-averaging function or Lebesgue p-norm (Lp 
norm), for p=1 and p=2, respectively. For this norm 
function it applies that for an increasing p the value of 
the norm ever more approaches the maximum of the function 
f within the interval. The effect of applying the Lp norm 
as an averaging function on the disturbance signal is 
therefore that, in the event of an increasing p, the 
higher signal values of the disturbance signal over the 
averaging interval are counted ever more dominantly in the 
averaging result. 

In the present quality-measurement technique, it is 
customary to use test signals of spoken sentences 
comprising two sentences or parts thereof and taking 
approx. 10 seconds. Here, it may be recognised that, in 
the event of spoken words, a syllable (having an average 
duration of approx. 0.3 s) is not intelligible when, in 
the voice signal, part of the syllable is disturbed. This 
signifies that in a disturbance signal comprising a signal 
part forming a representation of a disturbance signal of 
such a disturbed syllable, such signal part may be locally 
replaced by an averaged signal value which exceeds the 
signal value obtained by way of a linear averaging, in 
order to extract information relevant to the determination 
of the quality. Said higher average signal value may be 
obtained, e.g., by applying an Lp norm having a relatively 
high p-value on said signal part. At the sentence level, 
however, a second sentence or part thereof continues to be 
intelligible if only the intelligibility of a preceding 
first sentence or part thereof is affected by disturbance, 
so that for time averaging at this level an averaging 
function may be applied corresponding to, or at least 
deviating less from, the linear averaging, such as, e.g., 
an Lp norm having a relatively low p, e.g., p=1 or p=2. 

The inventive idea proper, which is also applicable 
more generally to arbitrary audio signals, now includes 
the application, instead of the known singular time 
averaging, of a dual or 2-stage time averaging. Said 2-stage 
time averaging comprises two substeps: a first substep in 
which the time-dependent disturbance signal obtained in 
the combination step is subjected, first at the local 
level, i.e., over relatively small time intervals, to a 
first averaging function, an average value being obtained 
per time interval; and a second substep in which the 
average values obtained in the first substep are subjected 
to a second averaging function over the entire signal 
duration. The first averaging function differs from the 
second averaging function and deviates more 
strongly from the linear averaging than the second 
averaging function. 

According to the invention, the method and the device 
of the above kind therefore have the characteristic of 
claim 1, and the characteristic of claim 7, respectively. 

In first preferred embodiments of the method and the 
device, averaging functions are applied which are based on 
an Lp norm, namely, in the first substep an Lp norm having 
a relatively high p-value, and in the second substep an Lp 
norm having a relatively low p-value. For this purpose, 
the method and the device are preferably characterised 
according to claims 3 and 8, respectively. 

Further preferred embodiments of the method and the 
device according to the invention are summarised in the 
subclaims. 

C. REFERENCES 

[1] Beerends J.G., Stemerdink J.A., "A perceptual audio 
quality measure based on a psychoacoustic sound 
representation", J. Audio Eng. Soc., Vol. 40, No. 12, 
Dec. 1992, pp. 963-978; 

[2] WO-A-96/28950; 

[3] WO-A-96/28952; 

[4] WO-A-96/28953; 

[5] WO-A-97/44779; 

[6] WO-A-96/06496. 

All References are considered as being incorporated 
into the present application. 

D. BRIEF DESCRIPTION OF THE DRAWING 

The invention will be set forth in further detail by 
way of a description of an exemplary embodiment, reference 
being made to a drawing comprising the following figures: 
FIG. 1 schematically shows a known device for 
determining the quality of a sound signal; 
FIG. 2 shows, in parts (a), (b) and (c), graphic 
representations for the benefit of the 
explanation of the time-averaging step in the 
method according to the invention: in part (a), 
a graphic representation having an example of a 
disturbance signal as a function of time, broken 
down into subsignals per interval; in part (b), 
a graphic representation of average signal 
values of the subsignals per interval obtained 
in a first substep of the time-averaging step; 
and in part (c), a graphic representation of 
several quality-signal values obtained in a 
second substep of the time-averaging step; 
FIG. 3 schematically shows a time-averaging device 
modified according to the invention for 
application in a device according to FIG. 1. 

E. DESCRIPTION OF AN EXEMPLARY EMBODIMENT 

FIG. 1 schematically shows a known measurement device 
for determining the quality of a sound signal. The 
measurement device comprises a signal processor 10 having 
signal inputs 11 and 12, and having signal outputs 
coupled, by way of signal couplings 13 and 14, to signal 
inputs of a combining device 15. The combining device 15 
is provided with a signal output which, by way of a signal 
coupling 16, is coupled to a signal input of a time- 
averaging device 17. The time-averaging device 17 is 
provided with a signal output 18 which in addition forms 
the output of the measurement device. 

Said known measurement device roughly operates as 
follows. On the signal inputs 11 and 12 of the signal 
processor 10, an input signal X(t), of which the signal 
quality is to be determined, and a reference signal Y(t), 
respectively, are offered. The input signal X(t) is an 
output signal of an audio or voice signals-processing 
and/or -transporting system (not shown), whose signal- 
processing and/or -transporting quality is to be 
investigated. The signal processor 10 processes the 
signals X(t) and Y(t), and generates representation 
signals R(X) and R(Y) which form representations of the 
signals offered X(t) and Y(t) according to a perception 
model of the human hearing laid down in (the hardware 
and/or software of) the signal processor. In most cases, 
the representation signals are functions of time and 
frequency (Hz scale or Bark scale). The representation 
signals R(X) and R(Y) are passed on by the signal 
processor 10, by way of the signal couplings 13 and 14, 
respectively, to the combining device 15. In the 
combining device 15, under the execution of various 
operations on the representation signals, such as 
comparison, scaling, determination of a ratio signal or an 
absolute-difference signal, and integration over the 
frequency, a time-dependent disturbance signal D(t) is 
generated, which is offered to the time-averaging device 
17 by way of the signal coupling 16. In the time- 
averaging device, the disturbance signal D(t) is averaged 
over time by carrying out an integration over time 
over the duration of the signal, the result of 
said time averaging becoming available, as a quality 
signal Q, at the signal output 18 of the time-averaging 
device. The time-independent quality signal Q constitutes 
a measure for the quality of the auditive perception of 
the signal X(t). As a time averaging, the linear time 
averaging is customary, i.e., the integration of the 
disturbance signal D(t) over time, divided by the total 
time duration of the signal (see, e.g., Appendix F of 
Reference [1], pp. 977/8). By such a time averaging, 
however, brief disturbances in a sound signal, which may 
have a significant effect on the quality perception of the 
entire signal, are averaged out. Where this occurs, 
it may result in a poor correlation between the human 
quality perception and the quality signal obtained by way 
of the measurement technique. In the event of applying 
the "root mean square" as a time-averaging function, a 
correlation is obtained which is still too low for a 
sound operation of the objective method. 

The linear time averaging and the "root mean square" 
are actually specific cases of the Lebesgue p-averaging 
function or Lebesgue p-norm (Lp norm): 

$$L_p(f) = \|f\|_p = \left( \frac{1}{\mu(a,b)} \int_a^b |f(x)|^p \, d\mu \right)^{1/p} \qquad \{1\}$$

for a function f integrable over a specific interval (a, 
b) having a measure μ, and: 

$$L_p(f) = \|f\|_p = \left( \frac{1}{n} \sum_{i=1}^{n} |f(x_i)|^p \right)^{1/p} \qquad \{2\}$$

for a function f defined in n discrete points x_i (i=1,...,n) 
in the interval (a, b), for p=1 and p=2, respectively. 
For said norm, it applies that for increasing p, the value 
of the norm ever more approaches the maximum f_max of the 
function f within the interval, and that in the limit for 
p→∞, it applies that L∞(f) = f_max. The effect of applying 
such a norm function as an averaging function on (part of) 
a disturbance signal therefore is that, for increasing p, 
the higher signal values of the disturbance signal over 
the averaging interval are ever more dominantly counted in 
the averaging result. In the Lp norm generally it applies 
that p ∈ ℝ. However, in the context of the present 
invention p ∈ ℕ is more sensible. 
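
By way of illustration only, the behaviour of the discrete Lp average (the p-th root of the mean of the p-th powers) for increasing p may be sketched in Python as follows. The function name lp_average and the sample values are assumptions made for this sketch; they do not appear in the original text.

```python
import math


def lp_average(values, p):
    """Discrete Lp average: ((1/n) * sum(|v|^p))^(1/p).

    p = math.inf is treated as the limiting case, i.e. the maximum of |v|.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if math.isinf(p):
        return max(abs(v) for v in values)
    n = len(values)
    return (sum(abs(v) ** p for v in values) / n) ** (1.0 / p)


# Hypothetical disturbance samples: mostly low, with one brief peak.
samples = [0.1, 0.1, 0.1, 2.0, 0.1]

# p=1 is the linear average, p=2 the root mean square; larger p
# moves the result towards the maximum sample value.
for p in (1, 2, 6, 20, math.inf):
    print(p, lp_average(samples, p))
```

In this sketch the linear average (p=1) is dominated by the many low values, whereas higher p-values let the single peak dominate the result.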

In order to prevent averaging out the influence of 
relatively brief disturbances in the final quality signal, 
the time-averaging step is carried out in two substeps, 
which are explained with reference to FIG. 2. In said two 
substeps, two different averaging functions are applied to 
the disturbance signal one after the other, which are 
chosen in such a manner that the first averaging function 
in the first substep has higher (signal) values of the 
disturbance signal over an averaging interval more 
dominantly counted in the averaging result than the second 
averaging function. In general, such pairs of averaging 
functions may be determined by individual selection, e.g., 
using simulation. When applying the Lp norm as an 
averaging function, it is only required in the first 
substep to choose an Lp norm having a p-value which is, 
e.g., a number of times larger than the p-value of the Lp 
norm applied in the second substep. Since the Lp norm is 
based on a specific form of convex functions, namely, the 
function g(x) = |x|^p for p=1,2,..., having as its inverse 
function g^{-1}(x) = |x|^{1/p}, it may be expected that in the general 
class of convex functions other suitable pairs may be 
found. The following, more general forms of the formulas 
{1} and {2} for an averaging function or norm are associated: 

$$L_g(f) = \|f\|_g = g^{-1}\!\left( \frac{1}{\mu(a,b)} \int_a^b g(f(x)) \, d\mu \right) \qquad \{1a\}$$

and 

$$L_g(f) = \|f\|_g = g^{-1}\!\left( \frac{1}{n} \sum_{i=1}^{n} g(f(x_i)) \right) \qquad \{2a\}$$

Suitable functions, on which the averaging functions in 
the first and the second averaging steps may be based, 
are, e.g., g_1(x) = exp(px) with p=1,2,..., having as its 
inverse function g_1^{-1}(x) = p^{-1} ln(x), in the first step, in 
combination with, in the second step, g_2(x) = |x| or g_2(x) = |x|^2. 
It should be noted therefore that, although in the further 
description for simplicity's sake use is made only of the 
Lp norm as an averaging function, this does not signify 
that the invention is limited thereto. 
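
As a minimal sketch, assuming the generalised discrete average takes the form g^{-1} applied to the mean of g over the sample values, the exponential function mentioned above may be tried out in Python as follows. The names g_average, samples and the choice p=4 are hypothetical and serve only this illustration.

```python
import math


def g_average(values, g, g_inv):
    """Generalised average: g_inv((1/n) * sum(g(v)))."""
    if not values:
        raise ValueError("values must be non-empty")
    return g_inv(sum(g(v) for v in values) / len(values))


# Hypothetical disturbance samples with one brief peak.
samples = [0.1, 0.1, 0.1, 2.0, 0.1]
p = 4  # assumed value, chosen only for this example

# First-step candidate: g(x) = exp(p*x), inverse ln(y) / p.
exp_avg = g_average(samples,
                    lambda x: math.exp(p * x),
                    lambda y: math.log(y) / p)

# Second-step candidate: g(x) = |x|, which is the linear average.
lin_avg = g_average(samples, abs, lambda y: y)

# g(x) = |x|^2 with inverse sqrt reproduces the root mean square.
rms_avg = g_average(samples, lambda x: abs(x) ** 2, math.sqrt)

print(lin_avg, rms_avg, exp_avg)
```

For these sample values the exponential average lies well above the linear average and below the peak value, which is the property required of the first averaging function.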

In part (a) of FIG. 2, an example is offered of a 
disturbance signal D(t) as a function of time, the time 
being plotted along the horizontal axis and (the intensity 
of) the signal D(t) being plotted along the vertical axis. 
In a first substep, the total time duration Ttot of the 
signal D(t) is first broken down into n intervals Ti 
(i=1,...,n) of preferably equal time duration Tint, and the 
signal D(t) proper is broken down into signal parts, one 
signal part Di(t) per interval Ti. Subsequently, in each 
interval Ti (i=1,...,n) a time average is determined 
according to the Lp norm (see formula {1}) of the signal 
part Di(t) at a first, relatively high p-value p1 (e.g., 
p1=6). In this connection, it should be noted that the 
disturbance signal D(t) has been represented as a 
continuous function only by way of example. It is customary 
that the signal D(t) becomes available as a time-discrete 
function at the output of the combining device 15 in the 
form of a time-sequential row of values, e.g., twenty per 
time interval, which may be interpreted as sampling points 
of a continuous function. In this case, the Lp norm is 
determined using formula {2}. The values of the time 
averages, Lp1(Di) for i=1,...,n, are represented for each 
interval Ti in part (b) of FIG. 2, designated by a short 
horizontal dash 21. For comparison's sake, in each 
interval the values are also represented of the time 
averages for p1=1 and p1=∞, i.e., L1(Di) and L∞(Di), 
respectively designated by a long horizontal dash 22 and 
by a dot 23. 

In a second substep, the Lp norm of the values Lp1(Di) 
determined per interval Ti is determined over the total 
time duration Ttot according to formula {2} with a second, 
relatively low p-value p2<p1 (e.g., p2=1 or 2), which 
results in the quality signal Q. Part (c) of FIG. 2 shows 
the average value over the n intervals according to the 
norm Lp2 for p2=1 of the values Lp1(Di), L1(Di) and L∞(Di), 
respectively designated by a short horizontal dash 24, by 
a long horizontal dash 25 and by a dot 26. The value of Q 
as designated by dash 25, and therefore obtained via a 
2-step averaging with p-values p1=p2=1, substantially 
corresponds to the value obtained by way of the known 
singular time averaging wherein the L1 norm is applied. 
This signifies that the improvement of the correlation 
envisaged by the invention may be achieved only if p1>p2. 

If it is simple in the first substep to determine the 
maximum of the signal parts Di(t) in each interval Ti, 
e.g., p1=∞ is chosen. In the second substep, the choice 
of p2=1 is the most simple one. 
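
The two substeps described above may be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the measurement device: the function names, the interval length of five samples and the disturbance values are hypothetical, and the intervals are taken to be non-overlapping and of equal length.

```python
import math


def lp_average(values, p):
    """Discrete Lp average; p = math.inf yields the maximum of |v|."""
    if not values:
        raise ValueError("values must be non-empty")
    if math.isinf(p):
        return max(abs(v) for v in values)
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def two_stage_quality(disturbance, interval_len, p1, p2):
    """First substep: Lp1 average per interval.
    Second substep: Lp2 average over the per-interval averages."""
    intervals = [disturbance[i:i + interval_len]
                 for i in range(0, len(disturbance), interval_len)]
    local_averages = [lp_average(part, p1) for part in intervals]
    return lp_average(local_averages, p2)


# Hypothetical disturbance: four intervals of five samples each,
# with one brief peak in the second interval.
d = [0.1] * 5 + [0.1, 0.1, 2.0, 0.1, 0.1] + [0.1] * 10

q_single = lp_average(d, 1)                   # known single-stage linear averaging
q_same = two_stage_quality(d, 5, 1, 1)        # p1 = p2 = 1
q_two = two_stage_quality(d, 5, 6, 1)         # p1 = 6, p2 = 1
q_max = two_stage_quality(d, 5, math.inf, 1)  # p1 = infinity, p2 = 1

print(q_single, q_same, q_two, q_max)
```

With equal-length intervals, the case p1=p2=1 reproduces the single-stage linear average up to rounding, whereas p1>p2 lets the brief peak contribute more strongly to the final value.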

It should be understood that, when using such a 
2-step time averaging, the effect of brief disturbances on 
the eventual quality signal continues to be significant. 
For test signals of spoken words, a total time duration 
Ttot of approx. 10 s is indicative, it being possible to 
assume, for Tint, the average duration of a spoken 
syllable, i.e., approx. 0.3 s. 

Apart from variation of the p-value, particularly in 
the first substep, the effect of brief disturbances may 
also be manipulated by a suitable choice of the duration 
of the time interval Ti, e.g., as a function of the kind 
of signal, e.g., spoken words or music, or of the tempo of 
the signal, slow or fast, but also as a function of the type 
of audio or voice signals-processing and/or -transporting 
system of which X(t) is the output signal. It has already 
been mentioned above that, in the event of a test signal 
with spoken words, the average duration of a syllable is 
approx. 0.3 s. Said average, however, may vary 
considerably in the event of sentences pronounced 
deliberately slow or fast, as the case may be. Something 
similar may apply to musical signals having a slow or fast 
rhythm, as the case may be. 

Another option of manipulating the effect of brief 
disturbances is to choose overlapping intervals, as 
a result of which the effect of brief disturbances which 
are present exactly on the interval boundaries is better 
taken into account. Such an overlap is, e.g., 10%, the 
next interval Ti+1 beginning at 0.9 of the interval Ti, or 
also 50%, the next interval Ti+1 already beginning halfway 
through the interval Ti. 
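
A possible way of generating such overlapping intervals for a time-discrete disturbance signal is sketched below in Python. The function name, the index-based representation and the handling of trailing samples are assumptions of this sketch; the interval length of twenty samples follows the example given earlier.

```python
def overlapping_intervals(n_samples, interval_len, overlap):
    """Return (start, end) index pairs of equal-length intervals.

    Each interval starts (1 - overlap) * interval_len samples after the
    previous one. Trailing samples that do not fill a complete interval
    are left out in this sketch.
    """
    if interval_len <= 0:
        raise ValueError("interval_len must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must satisfy 0 <= overlap < 1")
    step = max(1, round(interval_len * (1.0 - overlap)))
    last_start = max(n_samples - interval_len, 0)
    return [(start, min(start + interval_len, n_samples))
            for start in range(0, last_start + 1, step)]


def contains_span(intervals, first, last):
    """True if some interval contains all samples first..last inclusive."""
    return any(s <= first and last < e for s, e in intervals)


none = overlapping_intervals(40, 20, 0.0)   # no overlap
tenth = overlapping_intervals(40, 20, 0.1)  # 10% overlap
half = overlapping_intervals(40, 20, 0.5)   # 50% overlap

# A brief disturbance on samples 19 and 20 straddles the boundary of the
# non-overlapping intervals, but lies inside one interval at 50% overlap.
print(contains_span(none, 19, 20), contains_span(half, 19, 20))
```

The per-interval averages of the first substep would then be taken over these index ranges instead of over disjoint intervals.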

When listening to a sound signal, the part of the 
sound signal heard most recently generally has a greater 
effect on the quality perception than the first-heard part 
thereof. To have such an effect better expressed in the 
quality signal, too, in the second substep a weighted 
average may be applied by making use of a weighting 
function w(t), whether discrete or not, such as a monotonically 
increasing, at any rate non-decreasing, function having 
values between 0 and 1 over the total signal duration Ttot, 
for which, e.g., there applies: 
0 < w(t) ≤ 1/2 for t < (1/2)Ttot, and 
1/2 ≤ w(t) < 1 for (1/2)Ttot ≤ t < Ttot, 
there being allocated, to each interval Ti, a weight wi 
which is equal to, e.g., the maximum of w(t) in the 
interval Ti. In this connection, the norm function of 
formula {2} is adjusted to: 



$$L_{wp}(f) = \|f\|_{wp} = \left( \frac{1}{n} \sum_{i=1}^{n} w_i^p \, |f(x_i)|^p \right)^{1/p} \qquad \{2'\}$$
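
The effect of such a weighting in the second substep may be illustrated by the following Python sketch. It rests on several assumptions that are not taken from the original text: the weights are assumed to enter the average as (wi * |value|)^p, the weighting function is taken to be the linear ramp w(t) = t/Ttot, and the per-interval averages are invented example values.

```python
def weighted_lp_average(values, weights, p):
    """Weighted discrete Lp average (assumed form):
    ((1/n) * sum((w_i * |v_i|)^p))^(1/p)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    n = len(values)
    return (sum((w * abs(v)) ** p for v, w in zip(values, weights)) / n) ** (1.0 / p)


n = 4
# Weight per interval = maximum of the assumed ramp w(t) = t / Ttot within
# the interval, i.e. its value at the end of the interval.
weights = [(i + 1) / n for i in range(n)]  # 0.25, 0.5, 0.75, 1.0

# Hypothetical first-substep averages: the same peak, early versus late.
early_peak = [0.1, 1.5, 0.1, 0.1]
late_peak = [0.1, 0.1, 0.1, 1.5]

q_early = weighted_lp_average(early_peak, weights, 1)
q_late = weighted_lp_average(late_peak, weights, 1)

print(q_early, q_late)
```

Under these assumptions a disturbance near the end of the signal yields a higher value than the same disturbance near the beginning, in line with the greater influence of the most recently heard part.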

The time-averaging device 17, as schematically shown 
in FIG. 3, according to the invention consists of two 
averaging members 31 and 32. A first averaging member 31 
receives, by way of the signal coupling 16, the 
disturbance signal D(t) from the combining device 15, and 
processes said received signal according to the first 
substep described above. In it, the signal D(t) is first 
broken down, over n intervals Ti with i=1,...,n of the total 
signal duration Ttot of the signal D(t), into n subsignals 
Di(t), which are subsequently converted into a time- 
sequential row of time-averaged signal values Lp1(Di), 
determined per time interval Ti using an Lp norm having 
the relatively high p-value p1. Said row of signal values 
Lp1(Di) is passed on, by way of a signal coupling 33, to 
the second averaging member 32. The second averaging 
member determines, of said row of average signal values 
Lp1(Di), an average signal value Lp2(Lp1(D)) according to an 
Lp norm having a relatively low p-value p2 according to 
formula {2} or {2'}. The average signal value Lp2(Lp1(D)) 
is subsequently delivered, by the second averaging member 
32, as the quality signal Q determined, to the signal 
output 18 of the time-averaging device. 

F. CLAIMS 

1. Method for determining, according to an objective 
measurement technique, the quality of an output signal of 
a signal-processing system with respect to a reference 
signal, which method comprises the following main steps: 

- processing the output signal and generating a first 
representation signal, 

- processing the reference signal and generating a 
second representation signal, and 

- combining the first and the second representation 
signals to form a time-independent quality signal, 

the main step of combining comprising the following steps: 

- determining a differential signal as a function of 
time, and 

- averaging the disturbance signal over time, and 
generating the time-independent quality signal, 

characterised in that 

the step of averaging over time comprises: 

- a first substep of determining, in each time interval 
of a series of consecutive time intervals over the 
time duration of the differential signal, first 
signal averages of the disturbance signal according 
to a first averaging function, and 

- a second substep of determining, over said time 
duration, a second signal average from the first 
signal averages according to a second averaging 
function different from the first averaging function, 
the quality signal enclosing the second signal 
average. 

2. Method according to claim 1, characterised in that 
the first averaging function is an averaging function in 
which higher signal values are more dominantly present in 
the averaging result than in the second averaging 
function. 

3. Method according to claim 2, characterised in that 
the first and second averaging functions are functions 
according to Lebesgue p-averaging Lp having mutually 
differing p-values p1 and p2, respectively, wherein p1>p2. 

4. Method according to claim 3, characterised in that 
p1=∞ and p2=1. 

5. Method according to any of the preceding claims 1-4, 
characterised in that the time intervals in the first 
substep are intervals overlapping one another. 

6. Method according to any of the preceding claims 1-5, 
characterised in that the second averaging function of the 
second substep comprises a weighted averaging. 

7. Device for determining, according to an objective 
measurement technique, the quality of an output signal of 
a signal-processing system with respect to a reference 
signal, which device comprises: 

- a first signal-processing device for processing the 
output signal and generating a first representation 
signal, 

- a second signal-processing device for processing the 
reference signal and generating a second 
representation signal, and 

- a combination circuit for combining the first and the 
second representation signals, and generating a time- 
independent quality signal, 

which combination circuit comprises a differential device 
for determining a differential signal as a function of 
time, and a time-averaging device for generating the time- 
independent quality signal, 

characterised in that 

the time-averaging device comprises a first averaging 
member for determining, in each time interval of a series 
of consecutive time intervals over the signal duration of 
the differential signal, first signal averages of the 
differential signal according to a first averaging function, 
and a second averaging member for determining, for said 
time duration, a second signal average from the first 
signal averages according to a second averaging function 
differing from the first averaging function, the quality 
signal enclosing the second signal average. 

8. Device according to claim 7, characterised in that 
the first and the second averaging members are arranged 
for carrying out averaging functions according to a 
Lebesgue p-averaging Lp having mutually differing powers 
p1 and p2, respectively, wherein p1>p2. 

9. Device according to claim 8, characterised in that 
p1=∞ and p2=1. 

ABSTRACT 

This paper presents a method to monitor audio and video 
quality in the context of a digital broadcasting network. 
Because of bandwidth limitations, most existing methods 
cannot be applied. A method based on the evaluation of 
several of the impairments typically encountered in digital 
MPEG audio and video signals is proposed. The method 
has also been implemented in real time. Performance results 
in various conditions using the prototype device are 
presented, as well as various applications. 

1 INTRODUCTION 

The quality of service that is provided to the end-user is a 
very important performance criterion in television. In 
digital broadcasting, each piece of equipment may introduce 
impairments that will affect the perceived audio and video 
quality. However, the impact on quality cannot be easily 
predicted only from the encoded bitstream impairments, 
because the content and the receiver may have a strong 
influence. Hence, parameters such as BER (Bit Error Rate) 
are inadequate to monitor quality; decoded picture and 
sound quality must be monitored directly. 
The scope of this paper is the design and test of a real-time 
working method for quality monitoring in the context of 
digital television networks. Section 2 reviews the state of 
the art of objective image and audio quality assessment 
methods. An approach that fulfils the technical constraints 
of the application to a real broadcasting network is chosen and 
outlined in section 3. Section 4 details the principles of the 
proposed audio and image quality assessment techniques 
that have been developed. Section 5 gives an overview of 
the test results and of the possible applications. 

2 STATE OF THE ART 

Many image and sound quality assessment methods have 
been developed in recent years. Traditional objective (i.e. 
automatic) measurement methods such as SNR (Signal to 
Noise Ratio), or THD (Total Harmonic Distortion) for 
audio cannot reliably evaluate perceived quality of digital 
signals (e.g. MPEG-2) [1]. 

Most quality assessment methods rely on the measurement 
of distortions on the signal between the input and the output 
of a system. This configuration requires transmitting some 
information about the input (reference) signals to the distant 
measurement point in a broadcasting network. Therefore, 
the amount of reference information should be minimized. 
Three classes can be defined [2]. 

The first class is based on the comparison of the full 
reference and impaired pictures or audio signals. These 
techniques simulate the human hearing or visualisation 
processes using the main psychoacoustical or psychovisual 
models [1][3]. The distortion measurement is carried out 
between these internal representations. Several metrics 
using these principles have been developed and provide 
results highly correlated with subjective quality, but at the 
expense of the processing complexity and the amount of 
reference information (several Mbit/s for video). 

The second class uses a reduced reference approach. The 
distortion is measured between parameters calculated on 
the reference and the impaired signal. A linear combination 
of the parameters usually provides the final objective 
evaluation. The combination model is adjusted to maximise 
correlation with subjective test results. Few audio quality 
techniques exist in that class. The ITS (Institute of 
Telecommunication Sciences) metric is a well-known 
image quality assessment approach of that kind [4]. The 
techniques in this class are less efficient, but require less 
processing and can be tuned to specific impairments. 

The last class includes techniques that do not need any 
reference signal. Impairment detection is based on their 
main spatial, frequency or time characteristics. Learning 
algorithms are often used. For MPEG pictures, much work 
has focused on blocking artefacts. Some audio quality 
assessment techniques have also been developed such as 
OBQ (Output-Based objective Quality) [5]. Work has to be 
done to obtain results that are better correlated with 
subjective data. 

3 TECHNICAL APPROACH 

The practical issues linked to the transmission of the 
reference information over a broadcasting network, as well 
as the need for a sufficient correlation with the subjective 
evaluation, make the reduced reference approach in Figure 1 
well suited. 



[Figure 1: block diagram. Labels: Reference signals; Disturbance; Digital television system (Broadcasting network); Processing of input parameters; Impaired signals; Quality; Processing of output parameters and impairment features.] 

Figure 1: Reduced reference, comparative approach 

The input parameters are transmitted in-band with the 
digital TV programs, in a dedicated quality of service 
channel multiplexed into the MPEG-2 Transport Stream. 
The bit-rate required to transmit the parameters has to be in 
the order of a few kbit/s, in order to be affordable. In this 
way, the parameters are easily broadcast to all final control 
measurement points. 

The operation of input and output equipment is 
synchronous, to carry out the comparison on the 
parameters. The system labels all measurements with 
specific time stamps, which are present in the MPEG-2 
stream. 

4 QUALITY ASSESSMENT METHOD 

The synoptic of the quality assessment method is depicted 
in Figure 2. In the first step, impairment features are 
extracted. They seek to represent typical encoding and 
transmission error impairments separately for audio and 
video. These measurements are obtained by comparing a set 
of parameters that represent the contents of impaired 
(output) signals with the same set of parameters extracted 
from the input reference signals. 

[Figure 2: block diagram. Labels: Impaired audio-video sequence; Reference parameters; Video Analyser; Audio Analyser; Impairment features; Impairment combination model; Quality pMOS.] 

Figure 2: Processing of impairment features and global quality. 

The second step combines the audio and video 
impairment features with a non-linear model. This model 
has been optimized to predict a global perceived quality 
continuously, based on SSCQE (Single Stimulus 
Continuous Quality Evaluation) type subjective evaluation 
data [ITU-R Rec. BT.500-9]. 

4.1 Audio 

Two main types of audio impairments have to be detected. 
One parameter has been developed to detect each. 
Impairments resulting from transmission or network errors 
are represented by the first one. This type of impairment 
may introduce loss of samples, various distortions and 
strong impulsive noises. It may occur when the error 
correcting capabilities of the stream are exceeded. 
The impairments may be inaudible, but the audio signal may 
also be cut or become completely inaudible, depending on the 
element of the MPEG frame that has been corrupted. The 
SDR (Strong Degradation Rate) represents the rate of 
occurrence of this type of audio impairment. 

The second parameter is the PD (Perceptual Distance). It 
represents coding impairments due to excessive information 
removal when reducing the audio bit rate. This perceptual 
distance has been developed to assess the subjective impact 
of the impairments on the global audio quality, by taking 
into account the main psychoacoustical transformations that 
model the human hearing process: time and frequency 
spreading and masking, critical bands, etc. The impairment 
measurement is based on the comparison of the 
psychoacoustical representations of the reference and 
impaired audio signals. The MOS representing the 
predicted subjective audio quality of the digital audio signal 
is then obtained using a mapping function. 

4.2 Video 

Usual encoding impairments include blur, ringing, blocking 
effects and false edges. Transmission errors generate more 
visible effects, from empty or misplaced macroblocks to 
black or frozen pictures. Each of these impairments affects 
picture contents in a specific way. Based on this 
assumption, a set of parameters that seek to represent image 
contents lost or added in the spatial and temporal dimensions 
has been defined, with computationally efficient 
algorithms. These content parameters are based on a local 
linear transform. This approach makes it possible to define a 
local contrast that lies midway between a very global 
metric like the SNR and a very local metric like the 
maximum (L∞), in which only one pixel represents the whole 
picture. 

The impairment features are based on the comparison of the 
set of parameters for the reference and the impaired 
sequence. The definition of each feature has been optimised 
to match subjective evaluation. Four features have finally 
been selected: LR_SA, LR_TA, BM and DM, respectively 
featuring the loss of spatial activity, the loss of temporal 
activity, the blocking effect, and mean luminance value 
changes. The response of several of these features to 
transmission impairments has been given in a previous paper [6]. 
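
To make the idea of activity-loss features concrete, the sketch 
below computes a spatial activity, a temporal activity and their 
relative loss between reference and impaired pictures. It is only 
an illustration under stated assumptions: the mean absolute 
neighbour difference used here is a simple stand-in, not the local 
linear transform of the method, and all function names are 
hypothetical. 

```python
def spatial_activity(frame):
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    rows, cols = len(frame), len(frame[0])
    total, count = 0, 0
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                total += abs(frame[y][x + 1] - frame[y][x])
                count += 1
            if y + 1 < rows:
                total += abs(frame[y + 1][x] - frame[y][x])
                count += 1
    return total / count


def temporal_activity(previous, current):
    """Mean absolute pixel difference between two consecutive frames."""
    diffs = [abs(c - p)
             for prev_row, cur_row in zip(previous, current)
             for p, c in zip(prev_row, cur_row)]
    return sum(diffs) / len(diffs)


def activity_loss(reference_value, impaired_value):
    """Relative loss of activity; 0 when the impaired value is not lower."""
    if reference_value == 0:
        return 0.0
    return max(0.0, (reference_value - impaired_value) / reference_value)


# Hypothetical 2x2 luminance pictures: a detailed reference and a fully blurred copy.
reference_frame = [[0, 100], [100, 0]]
blurred_frame = [[50, 50], [50, 50]]

spatial_loss = activity_loss(spatial_activity(reference_frame),
                             spatial_activity(blurred_frame))
# A frozen picture repeats the previous frame, so its temporal activity is zero.
frozen_temporal = temporal_activity(blurred_frame, blurred_frame)
print(spatial_loss, frozen_temporal)
```

In this toy case the blurred picture has lost all spatial activity, 
and a frozen picture shows no temporal activity at all. 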

4.3 Perceptual quality prediction model 

Perceived quality is strongly influenced by audio and video 
impairments. In order to obtain a global quality evaluation, 
the impairment features must be merged. Figure 
3 shows the relationship between impairments and the 
subjective SSCQE quality (MOS). The perceived quality is 
affected when one, several or all impairment features 
respond. 
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Figure 3: Relationship between subjective evaluation (MOS) and 
the objective impairment features. 

A nonlinear combination model is used to predict quality 
(pMOS) from the audio and video impairment features. A 
training phase sets up the model and takes the 
interactions between impairment features into account. In 
the prediction phase, the combined relevance of the features 
and the model makes it possible to predict quality. 
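
The prediction phase of such a model could look like the sketch 
below. The functional form, the feature values and all weights 
are assumptions made for illustration; the text does not specify 
the model, and in practice the weights would come from the 
training phase on subjective data. Pairwise product terms stand 
for the interactions between features, and non-negative weights 
are assumed so that the result stays on a 0 to 100 scale. 

```python
from itertools import combinations


def predict_pmos(features, weights, pair_weights):
    """Combine impairment features into a predicted score on a 0-100 scale.

    features:     feature name -> impairment value (0 means no impairment)
    weights:      feature name -> non-negative individual weight
    pair_weights: (name_a, name_b) with name_a < name_b -> interaction weight
    """
    penalty = sum(weights[name] * value for name, value in features.items())
    for a, b in combinations(sorted(features), 2):
        penalty += pair_weights.get((a, b), 0.0) * features[a] * features[b]
    # No impairment gives 100; the score decreases nonlinearly with the penalty.
    return 100.0 / (1.0 + penalty)


# Hypothetical weights, as if obtained from a training phase.
weights = {"BM": 2.0, "SDR": 4.0}
pair_weights = {("BM", "SDR"): 4.0}

clean = predict_pmos({"BM": 0.0, "SDR": 0.0}, weights, pair_weights)
video_only = predict_pmos({"BM": 0.5, "SDR": 0.0}, weights, pair_weights)
both = predict_pmos({"BM": 0.5, "SDR": 0.5}, weights, pair_weights)
print(clean, video_only, both)
```

With these assumed weights, simultaneous audio and video 
impairments lower the score more than the video impairment alone. 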

5 TEST RESULTS 

5.1 Simulation and implementation tests 

This section gives some results of simulations that were 
conducted to check the relevance of the quality 
assessment method. Figure 4 summarizes 3 minutes of the 
results obtained for a 30-minute sequence extracted from a 
commercial program and transmitted over a satellite link 
with QPSK modulation. Overall, the method provides a 
predicted quality (pMOS) that matches the subjective 
evaluation (MOS). Some significant errors arise, 
particularly for intermediate MOS quality values, whereas 
higher and lower levels are better predicted. This effect 
mainly originates from the fact that intermediate level 
values are often a transient position of the observers' 
response to a strong impairment. 




[Figure 4 plot; horizontal axis: Time Code.] 

Figure 4: SSCQE MOS quality and objective predicted pMOS for 
a 3-minute-long sequence with satellite transmission errors. 

A more extensive test has been made using audio-video 
material composed of various content, in order for the range 
of impairments to be as exhaustive as possible. It includes 
magazine and news programs as well as more critical programs 
such as sports, for a total sequence of 30 minutes, coded at a 
constant bit rate of 5 Mbit/s. This coded sequence is 
transmitted using hardware in several conditions and 
broadcasting channels (cable, satellite, terrestrial). In this 
test, the results are represented on a discrete quality scale 
with five levels. A discrete scale has been found to be more 
useful in the context of network monitoring, because of its 
conciseness. Additionally, quantizing the pMOS and MOS 
values makes correlation diagrams like Figure 5 
clearer. 
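
The quantization onto a five-level scale can be sketched as below. 
Equal-width bins over a 0 to 100 score range are an assumption of 
this example; the actual level boundaries are not given in the 
text. 

```python
def quality_level(score, levels=5, scale_max=100.0):
    """Map a continuous score in [0, scale_max] to an integer level 1..levels."""
    clipped = min(max(score, 0.0), scale_max)
    width = scale_max / levels
    # The top of the scale belongs to the highest level rather than opening a new one.
    return min(int(clipped // width) + 1, levels)


scores = [0.0, 19.9, 20.0, 71.0, 100.0]
print([quality_level(s) for s in scores])  # [1, 1, 2, 4, 5]
```

Out-of-range scores are clipped to the ends of the scale before 
being quantized. 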

Figure 5 gives the quality results for a sequence that was not 
used during the training phase of the method. For each 
quality level, the standard deviations for the subjective 
MOS and objective pMOS evaluations are given, as well as 
their ratio. The optimal ratio is 1, which corresponds to the 
case where the error made by the objective model is no larger 
than the uncertainty on the subjective MOS. 




[Figure 5 plot; horizontal axis: predicted MOS (pMOS), from 0 to 100.] 

Figure 5: Results on unknown sequence of the method using 4 
video impairment features and the audio feature. 

Bad quality pictures are correctly evaluated, which is a very 
valuable result for real network monitoring. On the whole, 
the decrease of the linear correlation coefficient between the 
sequence used during the training phase and the unknown 
sequence is quite limited, from 0.75 to 0.71. 
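
For reference, the linear (Pearson) correlation coefficient between 
subjective and predicted scores can be computed as in the sketch 
below. The sample values are invented for the example and are not 
results from the tests reported here. 

```python
from math import sqrt


def pearson(xs, ys):
    """Linear correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical subjective (MOS) and predicted (pMOS) scores on a 0-100 scale.
mos = [80.0, 60.0, 40.0, 20.0]
pmos = [70.0, 65.0, 45.0, 10.0]
print(round(pearson(mos, pmos), 2))
```

A value of 1 indicates a perfect linear relationship between the 
two series, and a value close to 0 indicates none. 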

We also observed that the video signal is impaired before 
the audio signal when the transmission perturbation increases. 
This can be explained by the fact that the audio bitstream 
has a much lower bit rate and, as such, a lower probability 
of being impaired. Also, the prediction error is increased, as 
shown by the standard deviation ratio. 

5.2 Implementation 

The complete system has been implemented in a prototype 
device mainly composed of a PC and two DSP boards for 
efficient processing of the impairment features. It 
predicts quality in real time, based on the analysis of the 
video sequences. The resulting impairment features and the 
predicted quality are displayed instantaneously at the 
measurement point, and can also be sent to a supervision 
system to give a global view of the Quality of Service over 
the digital television distribution network. 

5.3 Applications 

Three main applications are running currently. One concerns 
system improvement and the others concern network monitoring. 
In order to check the influence of the IRD (Integrated 
Receiver Decoder) on the perceived quality, the same 
impaired bitstream has been decoded by two different 
professional IRDs for several minutes. In order to give a 
very concise view of the results, the five-level discrete 
quality scale has been used and the proportion of each level 
is displayed in Figure 6. IRD1 clearly appears to provide a 
lower quality, as we have visually observed more frozen or 
black frames. This may originate from the receivers' 
performance or the internal error masking strategy of each 
receiver. 

Figure 6: Long-term integration of the predicted pMOS quality 
levels for two receiver-decoders (level 5 = best quality). 
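
The long-term integration shown in Figure 6 amounts to computing 
the proportion of time spent at each quality level. A minimal 
sketch follows; the level sequences for the two receivers are 
invented for the example and are not measured data. 

```python
from collections import Counter


def level_proportions(levels, scale=(1, 2, 3, 4, 5)):
    """Return the fraction of measurements falling on each level of the scale."""
    if not levels:
        raise ValueError("at least one measurement is required")
    counts = Counter(levels)
    total = len(levels)
    return {level: counts[level] / total for level in scale}


# Hypothetical sequences of quantized pMOS levels for two receivers.
receiver_a = [5, 5, 4, 1]
receiver_b = [5, 5, 5, 4]
print(level_proportions(receiver_a))
print(level_proportions(receiver_b))
```

Comparing the two resulting distributions gives the kind of 
concise, per-receiver summary described above. 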

Another application consists in determining the service 
coverage area. The quality assessment device has been used 
in a mobile measurement vehicle. An Entrance Control 
measurement point is placed close to the transmitter to 
compute and insert the reduced reference, and the Final 
control measurement point drives around the area. This 
approach is very effective for establishing service breakpoint 
areas. As an example, Figure 7 summarizes a measurement 
campaign carried out around a DVB-T transmitter. 




Figure 7: Breakpoints in covering area map. 



The main application is network monitoring. The quality 
monitoring device has been tested on various broadcasting 
DVB networks. Several network points were monitored. To 
this end, the measurement devices are remotely controlled 
through the SNMP protocol. These tests have confirmed 
that the comparative approach is well adapted, since it 
reports any perceptible distortion in the network. Thus 
it is useful for a broadcaster to reserve a little bandwidth in 
order to dispatch a bitstream containing the reduced reference. 

6 CONCLUSION 

The reduced reference approach is a good compromise 
solution for real-time audio-video monitoring in a 
broadcasting network. Simulation results have validated the 
method. However, the influence of audio in the objective 
quality evaluation does not appear clearly, mainly because 
audio is statistically less impaired than video, due to the 
bit-rate difference in a TV program. 

In practice, the field of network monitoring requires very 
concise and simple quality indicators. We use a five-grade 
quality scale and long-term integration in order to have an 
overview of the quality of a given sequence. Several field 
trials on real networks have shown the relevance of audio and 
video quality assessment for network monitoring. 

Future work will investigate the subjectively relevant audio 
and video interactions to provide a long-term audio-video 
measurement. 
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